Language model adaptation using cross-lingual information
Authors
Abstract
The success of statistical language modeling techniques is crucially dependent on the availability of a large amount of training text. For a language in which such large text collections are not available, methods have recently been proposed to take advantage of a resource-rich language, together with cross-lingual information retrieval and machine translation, to sharpen language models for the resource-deficient language. In this paper, we describe investigations into such language models for an automatic speech recognition system for Mandarin Broadcast News. By exploiting a large side-corpus of contemporaneous English news articles to adapt a static Chinese language model to the news story being transcribed, we demonstrate significant improvements in recognition accuracy. The improvement from using English text is greater when less Chinese text is available to estimate the static language model. We also compare our cross-lingual adaptation to monolingual topic-dependent language model adaptation, and achieve further gains by combining the two adaptation techniques.
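The abstract does not spell out the model, so the following is only a minimal sketch of one common way to realize this kind of adaptation, under stated assumptions. It assumes cross-lingual retrieval has already returned English tokens related to the story being transcribed, and it reduces machine translation to a hypothetical word-level translation table `P(c|e)`. The English unigram is pushed through that table to get a Chinese "cross-lingual" unigram, which is then linearly interpolated with a static model. For brevity the static model here is a unigram, whereas a real recognizer would interpolate with an n-gram model. The helper names `cross_lingual_unigram` and `mix` and all probabilities and weights are illustrative, not taken from the paper.

```python
from collections import Counter


def cross_lingual_unigram(english_tokens, translation_table):
    """Estimate P_CL(c) = sum_e P(c|e) * P(e) from retrieved English text.

    translation_table is a hypothetical dict: English word -> {Chinese word: P(c|e)}.
    """
    counts = Counter(english_tokens)
    total = sum(counts.values())
    p_cl = {}
    for e, n in counts.items():
        p_e = n / total  # relative frequency of e in the retrieved English text
        for c, p_c_given_e in translation_table.get(e, {}).items():
            p_cl[c] = p_cl.get(c, 0.0) + p_c_given_e * p_e
    # English words with no translation drop out, so renormalize to sum to 1.
    z = sum(p_cl.values())
    return {c: p / z for c, p in p_cl.items()} if z > 0 else {}


def mix(components):
    """Linearly interpolate distributions given as (weight, dict) pairs.

    Extra components (e.g. a monolingual topic model) can be added the same way.
    """
    total_weight = sum(w for w, _ in components)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    vocab = set().union(*(dist for _, dist in components))
    return {word: sum(w * dist.get(word, 0.0) for w, dist in components)
            for word in vocab}


# Toy example with made-up numbers (assumption, for illustration only).
translation_table = {
    "stock": {"股票": 1.0},
    "market": {"市场": 0.9, "集市": 0.1},
}
english_tokens = ["stock", "market", "stock", "the"]  # "the" has no entry
static_unigram = {"股票": 0.1, "市场": 0.2, "天气": 0.7}

p_cl = cross_lingual_unigram(english_tokens, translation_table)
adapted = mix([(0.7, static_unigram), (0.3, p_cl)])
print(sorted(adapted.items(), key=lambda kv: -kv[1]))
```

In this toy run the story-specific words 股票 and 市场 gain probability mass relative to the static model, while 天气 loses some. In practice the interpolation weight would be tuned on held-out data rather than fixed by hand.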
Similar resources
An analysis of language mismatch in HMM state mapping-based cross-lingual speaker adaptation
This paper provides an in-depth analysis of the impacts of language mismatch on the performance of cross-lingual speaker adaptation. Our work confirms the influence of language mismatch between average voice distributions for synthesis and for transform estimation and the necessity of eliminating this mismatch in order to effectively utilize multiple transforms for cross-lingual speaker adaptat...
Twitter Translation using Translation-Based Cross-Lingual Retrieval
Microblogging services such as Twitter have become popular media for real-time user-created news reporting. Such communication often happens in parallel in different languages, e.g., microblog posts related to the same events of the Arab spring were written in Arabic and in English. The goal of this paper is to exploit this parallelism in order to eliminate the main bottleneck in automatic Twitt...
Semi-Supervised Representation Learning for Cross-Lingual Text Classification
Cross-lingual adaptation aims to learn a prediction model in a label-scarce target language by exploiting labeled data from a label-rich source language. An effective cross-lingual adaptation system can substantially reduce the manual annotation effort required in many natural language processing tasks. In this paper, we propose a new cross-lingual adaptation approach for document classification ...
Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis
In the EMIME project, we are developing a mobile device that performs personalized speech-to-speech translation such that a user’s spoken input in one language is used to produce spoken output in another language, while continuing to sound like the user’s voice. We integrate two techniques, unsupervised adaptation for HMM-based TTS using a word-based large-vocabulary continuous speech recognizer...
Cross-lingual speaker adaptation via Gaussian component mapping
This paper is focused on the use of acoustic information from an existing source language (Cantonese) to implement speaker adaptation for a new target language (English). Speaker-independent (SI) model mapping between Cantonese and English is investigated at different levels of acoustic units. Phones, states, and Gaussian mixture components are used as the mapping units respectively. With the mo...